Method for detecting a signal pause between two patterns which are present on a time-variant measurement signal using hidden Markov models

ABSTRACT

The method recognizes a signal pause between two patterns that are present in a time-variant measurement signal and that are recognized using hidden Markov models. In a first signal processing stage, feature vectors are formed periodically for pattern recognition, which describe a signal curve of the measurement signal within a time slice. No speech pause is detected by a pause detector contained therein in a first time slice based on present features of a first feature vector. In a second signal processing stage, in a second time slice that follows the first time slice, the first feature vector is compared with at least two hidden Markov models, of which at least one has been trained to a pattern to be recognized and another has been trained to a pattern characteristic for a pause. If, in the comparison of the first feature vector with the hidden Markov models, a greater probability results for the presence of a pause, information concerning the presence of a pause (the pause information) is forwarded to the pause detector in the first signal processing stage. There the measurement signal is treated as a signal pause, at least in the second time slice.

BACKGROUND OF THE INVENTION

In many technical processes, pattern recognition acquires increased importance, since an increasing degree of automatization can thereby be achieved. Pattern recognition processes can as a rule be reduced to a time-variant measurement signal derived in a suitable way from the patterns to be recognized. However, in the automatic analysis of this measurement signal the problem arises that these measurement signals are not present in pure form, but rather are overlaid with stationary or non-stationary disturbing signals. In the examination of measurement signals derived from naturally uttered speech, these disturbing portions of the measurement signal are for example caused by background noises, breathing noises, machine noises, or also by the recording medium and the transmission path. Since the measurement signal is never present in pure form, it is particularly important to distinguish between the portions of the measurement signal containing the pattern to be recognized and other portions in which no pattern is present. For the better recognition of the patterns, it is thus particularly important to know exactly when patterns are present in the measurement signal and when no patterns, i.e. signals not resulting from the pattern, are present as pause signals in the measurement signal.

A pause detection is for example also important in order to achieve a reduction in the quantity of the transmitted data, for example in speech communication channels and also in satellite transmission, for general distinguishing of useful signal from disturbing signal in signal processing, or else to find the end of an expression in the automatic speech recognition system. A robust pause detector thereby serves for the improvement of the efficiency of speech-controlled systems. This holds in particular for speech recognition systems, since what is concerned there is the comparison of a spoken expression as a pattern with an already-existing version. The problem of pause determination specifically in automatic speech recognition has been described extensively by Rabiner (L. R. Rabiner and M. Sambur (1975), "An Algorithm for Determining the Endpoints of Isolated Utterances", The Bell System Technical Journal, 54(2), pages 297-315). He has also indicated an algorithm for pause detection. There, for pause detection items of information are taken into account that are calculated directly from the sampled time signal (energy, zero crossing rate, etc.). This procedure is common to all known pause detectors (J. H. Hansen, "Speech Enhancement Employing Boundary Detection and Morphological Based Spectral Constraints", IEEE International Conference On Acoustics, Speech and Signal Processing, pages 901-904, Toronto, ICASSP). As a rule, they use a more or less complicated control apparatus to carry out the classification of the pauses from the calculated features. As an alternative, statistical classifiers have also been used (H. Katterfeldt, "Sprachbestimmung mit Polynom Klassifikatoren", Proceedings Mustererkennung 7, DAGM-Symposium, Erlangen, pages 180-184). Due to this procedure, all these methods can operate only up to a certain disturbance level. The limit depends on the type of disturbance. They can no longer be used with small signal-to-noise ratios, since as a rule pause detectors are threshold-controlled. However, given very low signal-to-noise ratios, in environments with disturbances the current decision criteria with thresholds fail. In addition, there are non-stationary disturbances with a character similar to a signal, which can hardly be detected.

Previous approaches to the determination of speech pauses use e.g. a local parameter, i.e. one obtained on the basis of a temporal or, respectively, spectral item of frame information, for the detection of signal or, respectively, non-signal regions (S. Boll, (1979), "Suppression of Acoustic Noise In Speech Using Spectral Subtraction", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-27, No. 2, pages 113-120; and B. Widrow et al, (1975), "Adaptive Noise Cancelling: Principles and Applications", Proceedings of the IEEE, 63 (12), pages 1692-1716). Works on this subject published more recently are also primarily based on modifications or expansions of these works. Further procedures for pause recognition in time-variant signals are not known.

SUMMARY OF THE INVENTION

The underlying aim of the invention is to indicate an improved method for pause recognition between patterns that are present in a measurement signal and that were modeled using hidden Markov models.

In general terms the present invention is a method for recognizing a signal pause between two patterns that are present in a time-variant measurement signal and that are recognized using hidden Markov models. In a first signal processing stage, feature vectors are formed periodically for pattern recognition, which describe the signal curve of the measurement signal within a time slice. No speech pause is detected by a pause detector contained therein in a first time slice on the basis of present features of a first feature vector. In a second signal processing stage, in a second time slice that follows the first time slice, the first feature vector is compared with at least two hidden Markov models, of which at least one has been trained to a pattern to be recognized and another has been trained to a pattern characteristic for a pause. If, in the comparison of the first feature vector with the hidden Markov models, a greater probability results for the presence of a pause, the information concerning the presence of a pause, the pause information, is forwarded to the pause detector in the first signal processing stage. There the measurement signal is treated as a signal pause, at least in the second time slice.

Advantageous developments of the present invention are as follows.

A defined sequence of patterns, a pattern sequence, can be recognized. The pause information is forwarded after the recognition of the pattern sequence over several time slices, so that in the first signal processing stage, at least in the time slice following the pattern sequence, the measurement signal is treated as a signal pause and not as a pattern to be recognized.

Many feature vectors are intermediately stored until a pattern sequence has been recognized. The pause information is forwarded after the recognition of the pattern sequences, so that in the first signal processing stage, at least in the time slice before the pattern sequence, the measurement signal is treated as a signal pause and not as a pattern to be recognized.

Characteristics of the measurement signal are evaluated in the time domain in the first signal processing stage for pause recognition.

Characteristics of the measurement signal are evaluated in the spectral domain in the first signal processing stage for pause recognition.

Context-modeled hidden Markov models are used.

The measurement signal represents uttered speech.

Disturbances in the feature extraction stage of a speech processing system are suppressed.

A channel adaptation of a speech channel is carried out.

The measurement signal represents writing motions on a pad.

The measurement signal represents signal sequences of a message-oriented signaling method.

An advantage of the inventive method is that for the first time items of information that are obtained in different signal processing stages and that occur successively in time are used for pause detection. That is, the pause information is obtained by comparing a specific pause model with the feature vector of the measurement signal in a comparison stage, and is supplied back to the feature extraction stage of the pattern recognition, so that, in a further time slice in the feature extraction stage, the pause state can be taken into account in the measurement signal analysis.

The inventive method advantageously makes use of the information that certain pattern groups belong with one another, e.g., for words these are groups of phoneme patterns; in this way it is ensured that a pause must follow at least after the pattern group. This information is subsequently used advantageously in the feature extraction stage as the first processing stage of the method.

Advantageously, it is also ensured by the inventive method that a pause has to have occurred before the arrival of a pattern sequence to be recognized. This fact is likewise exploited during the pattern recognition.

Advantageously, the inventive method can be combined with known methods for pause recognition that evaluate characteristics of the measurement signal in the time domain and in the spectral domain. In this way, a higher detection rate can be achieved in the pattern recognition.

With the inventive method, speech patterns, writing patterns or signaling patterns can be particularly advantageously analyzed, since they occur in numerous technical applications and can be modeled in suitable fashion.

With the inventive method, it can be advantageously ensured that if no patterns are recognized a pause must be present; in this way, an increased detection rate is achieved in the pattern recognition, since an item of pause information can thereby be made available to the feature extraction stage even more reliably.

In the following, the invention is further explained on the basis of figures.

BRIEF DESCRIPTION OF THE DRAWINGS

The features of the present invention which are believed to be novel, are set forth with particularity in the appended claims. The invention, together with further objects and advantages, may best be understood by reference to the following description taken in conjunction with the accompanying drawings, in the several Figures of which like reference numerals identify like elements, and in which:

FIG. 1 shows a schematized example of a speech recognition system equipped with pause recognition.

FIG. 2 illustrates the pause recognition process on the basis of various hidden Markov models.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows on the basis of an example, realized here as a speech recognition system, how the pause information is detected and forwarded, i.e. conducted back, according to the inventive method. The measurement signal, here as the speech signal Spr, first goes into a feature extraction stage Merk, which corresponds to the first signal processing stage in the inventive method. In this first signal processing stage, the spectral features of the speech signal or, respectively, of the measurement signal Spr are standardly analyzed. These features, which are subsequently outputted by the feature extraction stage, are here designated with m in FIG. 1. Next, the spectral features m go, e.g. as feature vectors, into a classification stage Klass, in which they are compared with the hidden Markov models HMM. The inventive method now begins here, by comparing the feature vectors obtained from the measurement signals with specific hidden Markov models for individual phonemes or, respectively, for pause states. In the training phase of the hidden Markov models, for example typical feature vectors are estimated for the background noise, as is also done for the useful signal. In this way, it is possible that in a continuous pattern comparison in each interval of analysis, the useful signal and the noise signal can be distinguished. In case of a very poor signal-noise ratio, a still higher robustness is achieved

a) by means of a common evaluation of many analysis intervals and

b) by means of a recognition of the useful signals, whereby all signals that are not recognized as the useful signal can be allocated e.g. to noise.

The invention can advantageously be used in all known pattern recognition methods and can be combined with them. The inventive method is based in particular on the fact that the signal states and the feature vectors do not alter excessively from one time slice of the analysis interval to the next. In this way, an item of information obtained in the classification stage Klass can be forwarded to the feature extraction stage as pause information Pa, by determining e.g. that in the comparison of the hidden Markov models there is a higher probability for a pause than for a pattern to be recognized. It is highly probable that the time slice in which the pause is detected will be followed by a further time slice with a pause. By means of this procedure, undesired disturbances in the measurement signal can be suppressed in the formation of the feature vectors with great certainty, even with a low signal-noise ratio. Advantageously, by means of the inventive method the knowledge present in the recognition stage in a second time slice concerning the pause is transmitted to a first signal processing stage. This knowledge can for example be obtained from a speech signal via the acoustically phonetic modeling stage (hidden Markov models), which were already trained for speech recognition with a set of training data. In phoneme-based systems, the pause is trained at the same time as a model of a phoneme, and thus includes the statistics of the training data. More refined, and thus better, is the modeling taking into account the phoneme context, i.e. the knowledge of which phoneme follows another. If, for example, the pause decision of the acoustically phonetic modeling stage is combined with current criteria for pause estimation, an improvement of the pause decision can be achieved.

FIG. 2 shows the different Viterbi paths V1 to V3 for different hidden Markov models. Here the connection between the pattern recognition and the presence of a pause between different patterns is shown over time. First the measurement signal, which is for example a speech signal, a writing signal or a signal emitted by signaling methods, is transformed into a feature vector space via a suitable signal transformation or several signal transformations. In a training phase of the pattern recognition method, typical models are for example estimated for the background noise and also for the useful signal, which are subsequently to be used in the recognition method. For the inventive method, the training can for example be realized using the method of the hidden Markov models. However, the pause recognition method can likewise be carried out with other pattern recognition methods, such as for example dynamic programming or neural networks. If hidden Markov models are used in the inventive method, then among other things the distribution functions of the feature vectors can for example be estimated for each recognition unit. In this connection, recognition units refers to speech sounds (phonemes) in automatic speech recognition. The inventive method was realized for automatic speech recognition by way of example, but it is conceivable that it can be used for any type of pattern recognition. It need only be ensured that signal patterns can be provided and that pause states are present in which the disturbing signals can be determined in order to train the hidden Markov models for pause states. Some examples of this sort for other pattern recognition methods include for example the patterns that occur in the signing of a document in the form of pressure- or time-dependent writing signals, or signal sequences that are used in automatic message-oriented signaling methods.
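The estimation of a distribution function for one recognition unit can be pictured with a minimal sketch. It assumes, purely for illustration, that the distribution is a single Gaussian with diagonal covariance fitted to the feature vectors labelled as that unit (for instance "pause"); the function name estimate_gaussian and the variance floor are hypothetical and do not come from the description, and a real system would typically use hidden Markov model training instead.

```python
def estimate_gaussian(vectors, var_floor=1e-6):
    """Fit a diagonal-covariance Gaussian to the feature vectors of one unit.

    vectors: non-empty list of equally long feature vectors (lists of floats).
    var_floor: assumed lower bound that keeps every variance positive.
    Returns (mean, var) as two lists with one entry per feature dimension.
    """
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [
        max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, var_floor)
        for d in range(dim)
    ]
    return mean, var


# Two hypothetical feature vectors taken from segments known to be pauses.
pause_mean, pause_var = estimate_gaussian([[0.0, 2.0], [2.0, 2.0]])
```

The same routine would be applied once per recognition unit, so that the pause model and the useful-signal models are estimated from their own training segments.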

In the execution of the inventive method, in the recognition phase a continuous pattern comparison can for example calculate the probability of production for each recognition unit in each analysis interval, or, respectively, in each time slice. A simple solution is the evaluation of these probabilities. If the probability for a pause, that is, for the hidden Markov model for a pause or the equivalent thereof, is the highest, then the analysis interval concerned can be used for the new estimation of the distribution functions or for filtering out, given a noise suppression.
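The simple evaluation described above can be sketched as follows. The sketch assumes that each recognition unit, including the pause, is represented by a single diagonal-covariance Gaussian standing in for the emission score of its hidden Markov model; the names log_gauss, frame_is_pause and the example models are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def frame_is_pause(x, models):
    """Return True if the 'pause' model scores highest for this time slice.

    models maps a unit name to (mean, var); the key 'pause' is an assumption.
    """
    scores = {name: log_gauss(x, mean, var) for name, (mean, var) in models.items()}
    return max(scores, key=scores.get) == "pause"


# Hypothetical two-dimensional models for a pause and two speech sounds.
models = {
    "pause": ([0.0, 0.0], [1.0, 1.0]),
    "a": ([4.0, 1.0], [1.0, 1.0]),
    "s": ([1.0, 5.0], [1.0, 1.0]),
}
```

A time slice for which frame_is_pause returns True is one that could be used to re-estimate the noise statistics or be filtered out.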

The inventive method becomes still more robust if the result of a pattern recognizer is taken into account as an additional source of knowledge. If it is presupposed that for example the pattern recognizer is able to recognize every possible useful signal, the inventive method can make use of this and can define as pause all other analysis intervals not classified as useful signal. Such a time segment is designated with T_p in FIG. 2. If there is no demand for real-time processing in relation to the method, as is the case for example in simulations, the inventive method can hereby already count as sufficient for the pattern recognition. In practice, real-time criteria are to be used in the applications mentioned, and an allocation to the useful signal or noise signal must ensue as soon as possible. The method must thus for example be integrated into the recognition process itself. The recognition method is thus expanded according to the invention in such a way that after each analysis step it is for example evaluated which of the patterns, e.g. words, composed from the recognition units is the most probable. In addition, over a larger analysis interval the probability that this interval contains a signal pause is for example calculated. For example, the analysis interval is thereby dimensioned in such a way that in every case it is longer than short pauses, e.g. plosive pauses in the useful signal. This probability is then compared with that of the most probable pattern, whereby it is related to an equally long time interval. The result of this comparison can already be used as a decision.
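One possible reading of the comparison over an equally long time interval is sketched below. It assumes that the scores are log-probabilities, that the word score has been accumulated over all time slices so far, and that relating it to the pause window of P time slices means scaling it by P divided by the number of slices; this normalization and the name pause_beats_best_word are assumptions, not details given in the description.

```python
def pause_beats_best_word(pause_frame_logps, word_logps, n_frames, p):
    """Compare the pause score of the last p slices with the best word score.

    pause_frame_logps: per-slice log-likelihoods under the pause model.
    word_logps: dict mapping each word to its log-likelihood accumulated
                over all n_frames slices.
    The word score is scaled to p slices before the comparison (assumed rule).
    """
    if n_frames < p:
        return False  # not enough slices yet for a window of length p
    pause_logp = sum(pause_frame_logps[-p:])
    best_word_logp = max(word_logps.values())
    return pause_logp > best_word_logp * p / n_frames


# Hypothetical scores: three speech-like slices followed by three pause-like ones.
decision = pause_beats_best_word(
    [-9.0, -9.0, -9.0, -1.0, -1.0, -1.0], {"yes": -18.0, "no": -30.0}, 6, 3
)
```

Choosing p larger than the longest plosive pause, as the text suggests, keeps short silences inside a word from triggering the decision.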

Still higher demands are for example placed on speech recognition systems. In them, it must be avoided that the recognizer shuts off prematurely, thereby causing the output of a false word. In FIG. 1, the recognizer is designated Klass. These cases occur in particular with non-stationary disturbing noises. This can for example be prevented by an additional condition. For example, a signal pause is recognized as the end of a word only if, in addition to the criterion described above, the most probable word over a determined time span has always been the most probable word. This time span is designated T_ST in FIG. 2. Through the combination of these two described criteria, a high reliability is obtained in pause recognition, which is important for the sure functioning of a speech recognizer.
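The additional stability condition can be illustrated with a small counter. The sketch assumes that the time span T_ST is measured in time slices and that the best word is reported once per slice; the class WordStability and the function end_of_word are hypothetical names.

```python
class WordStability:
    """Counts for how many consecutive time slices the best word is unchanged."""

    def __init__(self):
        self.best = None
        self.count = 0

    def update(self, best_word):
        """Register the best word of the current slice and return the run length."""
        if best_word == self.best:
            self.count += 1
        else:
            self.best = best_word
            self.count = 1
        return self.count


def end_of_word(pause, stable_count, min_stable):
    """End of word only if a pause is detected AND the best word has been stable."""
    return pause and stable_count > min_stable


tracker = WordStability()
counts = [tracker.update(w) for w in ["no", "yes", "yes", "yes"]]
```

With this combination a brief non-stationary noise that momentarily favors the pause model does not end the recognition while the word hypothesis is still changing.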

The basic idea is, in a pattern recognition system, to exploit the knowledge sources present on different levels in signal processing stages for the detection of a pause. These extend for example to:

characteristics of the signal in the time domain, such as for example zero crossing rate and level, as well as

in the spectral domain, e.g. the power and the measure of correlation, including the logarithmic and/or feature domain.

In addition, the inventive method detects the pause by realizing a feedback of the recognition stage to the feature extraction stage.
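The two time-domain characteristics named in the list, zero crossing rate and level, can be computed per time slice as in the following sketch. The exact definitions (sign changes between neighbouring samples, level as mean power in decibels) are common conventions chosen here as assumptions; the description does not fix them.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of neighbouring sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def level_db(frame, eps=1e-12):
    """Mean power of the frame in decibels; eps avoids log of zero."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + eps)


# Hypothetical four-sample slices: an alternating signal and a constant one.
zcr_alternating = zero_crossing_rate([1.0, -1.0, 1.0, -1.0])
zcr_constant = zero_crossing_rate([1.0, 1.0, 1.0, 1.0])
```

A threshold-controlled pause detector of the known kind would compare such values against fixed limits; in the inventive method they are only one knowledge source among several.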

In this way, the information present in the various time slices concerning the presence of a pause in the classifier Klass is supplied to the feature extraction stage Merk. During the recognition, there ensues for example a dynamic pattern comparison, in which an allocation to the pre-trained models is made on the basis of the feature vectors in an analysis window or, respectively, in a time slice. A global search strategy, such as is realized e.g. by the Viterbi algorithm, finds the most probable sequence of pre-trained model states that reproduces the incoming sequence of feature vectors (L. R. Rabiner et al, (1986), "An Introduction to Hidden Markov Models", IEEE Transactions on Acoustics, Speech and Signal Processing, (1), pages 4-16).
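The global search mentioned above can be illustrated by a compact Viterbi decoder in the log domain. This is a generic textbook form, not the specific recognizer of the embodiment; the two-state example (state 0 as pause, state 1 as speech) and all probability values are hypothetical.

```python
import math


def viterbi(obs_logp, log_trans, log_init):
    """Most probable state sequence for a sequence of time slices.

    obs_logp[t][s]: log emission probability of slice t in state s.
    log_trans[i][j]: log transition probability from state i to state j.
    log_init[s]: log probability of starting in state s.
    Returns (path, log probability of that path).
    """
    n_states = len(log_init)
    delta = [log_init[s] + obs_logp[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(obs_logp)):
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + obs_logp[t][j])
        delta = new_delta
        back.append(ptr)
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):  # follow the back pointers to the first slice
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(delta)


lg = math.log
obs = [[lg(0.9), lg(0.1)], [lg(0.9), lg(0.1)], [lg(0.1), lg(0.9)], [lg(0.1), lg(0.9)]]
trans = [[lg(0.9), lg(0.1)], [lg(0.1), lg(0.9)]]
path, score = viterbi(obs, trans, [lg(0.5), lg(0.5)])
```

In this example the decoded path stays in the pause state for the first two slices and then switches to the speech state, which is the kind of pause/non-pause information that can be picked off at the classifier in each time window.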

Thus, in each time window the information about pause/non-pause can be picked off at the classifier Klass, and can be supplied to a pause detector in another stage. In the inventive method, this is for example realized in such a way that in the classifier a specific hidden Markov model for pause is compared with the incoming feature vectors; if a higher probability for pause occurs than for other patterns, a pause information signal is for example forwarded to the feature extraction stage Merk, and there leads to the decision that a pause is currently present. That is, with this pause information a pause detector already present in the extraction stage can also be controlled to set pause. This pause decision can for example be probability-weighted, and is based on a decision that takes into account other sources of knowledge within the inventive method. Such other knowledge sources include for example statistics of the measurement signal and the phoneme context from the Viterbi method. Based on the sequential structure of a recognizer, e.g. the delay by an analysis window must be taken into account, for example in a feeding back of the information to a pause detection stage for the suppression of disturbing noises. If, in speech recognition, the pause decision of the acoustically phonetic modeling stage is connected with current criteria for pause estimation, an improvement of the pause decision can be achieved. For example, if the frame-by-frame detection of the pauses is completely abandoned, a further knowledge source in the recognition system can be exploited for the pause estimation.
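The feedback with a delay of one analysis window can be sketched as follows. The sketch assumes that the feature extraction stage keeps a running noise estimate per spectral bin, updates it only while the fed-back pause information is set, and subtracts it from each incoming spectrum; this update rule and the names FeatureExtractor, set_pause and process are illustrative assumptions rather than the embodiment's actual noise suppression.

```python
class FeatureExtractor:
    """Feature extraction stage (Merk) with a pause flag fed back from Klass."""

    def __init__(self, n_bins, smoothing=0.9):
        self.noise = [0.0] * n_bins   # running noise estimate per spectral bin
        self.smoothing = smoothing    # assumed smoothing constant
        self.pause = False            # pause information Pa from the previous slice

    def set_pause(self, pause):
        """Receive the pause decision the classifier made for the previous slice."""
        self.pause = pause

    def process(self, spectrum):
        """Update the noise estimate during pauses, then subtract it."""
        if self.pause:
            a = self.smoothing
            self.noise = [a * n + (1.0 - a) * s for n, s in zip(self.noise, spectrum)]
        return [max(s - n, 0.0) for s, n in zip(spectrum, self.noise)]


merk = FeatureExtractor(n_bins=2, smoothing=0.5)
first = merk.process([2.0, 4.0])    # no pause information yet
merk.set_pause(True)                # classifier reports a pause for that slice
second = merk.process([2.0, 4.0])   # next slice is treated as a pause
```

Because the classifier's decision refers to the previous slice, the flag always acts one slice late, which is acceptable under the stated assumption that signal states change little from one slice to the next.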

For example, different patterns that are connected and that also belong together can be detected as a whole, and conclusions can be drawn therefrom concerning the pauses present in the measurement signal. For example, such a global pause detector can provide its information about the entire pattern or pattern sequence to be recognized. In the case of speech recognition, such a pattern sequence would be for example a word to be recognized. All regions outside this pattern sequence can thus for example be recognized as pause. This has the advantage that even current disturbances go into the pause detection. The inventive method thus still functions even at very high disturbance levels, and is thus more robust. As a result of the design, a larger time delay is to be allowed for before a decision is present. This global pause detection stage is thus to be used particularly in connection with an intermediate signal storing. It is particularly suited for the preparation of the measurement signal, and can in particular serve for the recognition of the separation pauses between individual words or, respectively, sequences of patterns to be recognized. An inventive system for pattern recognition and pause recognition can be described in summary fashion in the following stages.

1. Taking into account of the signal characteristics in the time domain (e.g. zero crossing rate, level);

2. Additional taking into account of the characteristics in the spectral domain (e.g. power, correlation measure), including the logarithmic and/or feature region;

3. Additional taking into account of the frame-by-frame pattern comparison with pre-trained pause models;

4. Additional taking into account of the feedback of the decision of the pause detector integrated into the global recognition.
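The global pause detection of the fourth stage, in which all regions outside a recognized pattern sequence are treated as pause, can be pictured with the following sketch. It assumes that the feature vectors have been stored intermediately and that the recognizer reports each recognized word as a pair of slice indices with an exclusive end; the function name label_pauses and this span format are hypothetical.

```python
def label_pauses(n_frames, word_spans):
    """Mark every buffered time slice outside the recognized words as pause.

    n_frames: number of intermediately stored time slices.
    word_spans: list of (start, end) slice indices per recognized word,
                with end exclusive (assumed format).
    Returns a list of booleans, True where the slice is treated as pause.
    """
    is_pause = [True] * n_frames
    for start, end in word_spans:
        for t in range(max(start, 0), min(end, n_frames)):
            is_pause[t] = False
    return is_pause


# Hypothetical buffer of six slices with one word recognized in slices 2 and 3.
labels = label_pauses(6, [(2, 4)])
```

The slices before and after the word are labelled as pause only once the whole word has been recognized, which reflects the larger time delay that this stage requires.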

For example, an embodiment of the inventive method is described by the pseudo-code shown in Table 1.

TABLE 1
______________________________________
main()
  do                        !Time loop
    signal_analysis()       !Transformation of the
                            !measurement signal into a
                            !feature region
    calculate_word_pb()     !calculates the probability for each
                            !reference word, e.g. with hidden
                            !Markov models and Viterbi decoding;
                            !this is the composite probability
                            !that all previous feature vectors
                            !were emitted by the respective word
                            !model
    calculate_pause_pb()    !calculates the probability for
                            !pause for the last P time
                            !steps; this is the composite
                            !probability that the last P
                            !feature vectors were emitted by
                            !the model for `Pause`
    pausedetector()         !sets pause to 1, if the
                            !probability for pause is higher
                            !than for the best word,
                            !otherwise pause = 0
                            !Thereby standardization of the
                            !probabilities to the same time
                            !duration P
    if (pause && word_stable > x) break
                            !Abort, if pause is recognized
                            !by pausedetector() (pause) and
                            !the best word has been the best
                            !without interruption for at least
                            !x time steps (word_stable)
  enddo
  output()                  !output recognized word
end
______________________________________

By way of example, the inventive method is realized in a main program that is bounded by main and end. This main program essentially contains a do loop as a time loop. A transformation of the measurement signal into a feature region is carried out with a procedure signal_analysis. For example, a specific time slice of the measurement signal is analyzed and feature vectors from this time slice are applied.

The applied feature vectors are subsequently analyzed in a subroutine calculate_word_pb. For example, there the probability is calculated for each reference word, e.g. with hidden Markov models and using Viterbi decoding. The composite probability that all previous feature vectors were emitted by the respective word model is thereby calculated. In an additional subroutine calculate_pause_pb, the probability for pause is calculated for the last P time steps. Here as well, the composite probability is calculated that the last P feature vectors were emitted by the model for pause. In a further subroutine pausedetector, a pause information signal is generated if the probability for pause is higher than for the best word; otherwise the pause information is not produced. For example, a standardization of the probabilities to be taken into account to the same time duration P is carried out here. In a further query, if (pause && word_stable > x) break, an abort of the method is carried out if pause has been recognized by the pause detector and the best word has been the best word without interruption for at least x time steps (word_stable). With the subroutine output, the recognized pattern sequence, a word in the case of speech recognition, is outputted.
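The control flow of Table 1 can be rendered as a short runnable sketch. It assumes that signal_analysis and the model scoring have already produced, for every time step, a log-likelihood per reference word and one for the pause model, and that the standardization to the duration P scales the accumulated word score by the window length divided by the number of steps; these inputs, the scaling rule and the function name recognize are assumptions made for illustration.

```python
def recognize(word_frame_logps, pause_frame_logps, p, x):
    """Time loop of Table 1 over precomputed per-step log-likelihoods.

    word_frame_logps: one dict per time step, word -> log-likelihood of that step.
    pause_frame_logps: one log-likelihood per time step under the pause model.
    p: number of time steps in the pause window; x: required stability.
    Returns (best word, time step at which the loop ended).
    """
    word_pb = {}
    best_word, word_stable, t = None, 0, 0
    for t in range(1, len(pause_frame_logps) + 1):
        # calculate_word_pb: accumulate the score of every reference word
        for word, logp in word_frame_logps[t - 1].items():
            word_pb[word] = word_pb.get(word, 0.0) + logp
        # calculate_pause_pb: score of the pause model over the last p steps
        n = min(p, t)
        pause_pb = sum(pause_frame_logps[t - n:t])
        # pausedetector: compare on the same duration (assumed scaling n / t)
        current_best = max(word_pb, key=word_pb.get)
        pause = pause_pb > word_pb[current_best] * n / t
        # word_stable: consecutive steps with an unchanged best word
        if current_best == best_word:
            word_stable += 1
        else:
            best_word, word_stable = current_best, 1
        if pause and word_stable > x:
            break
    return best_word, t


# Hypothetical scores: four speech-like steps for "yes", then three pause-like steps.
word_scores = [{"yes": -1.0, "no": -3.0}] * 4 + [{"yes": -5.0, "no": -5.0}] * 3
pause_scores = [-6.0] * 4 + [-1.0] * 3
result = recognize(word_scores, pause_scores, p=2, x=2)
```

In this example the loop ends at the sixth time step, once the last two steps are better explained by the pause model than by the scaled score of the stable best word.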

The invention is not limited to the particular details of the method depicted and other modifications and applications are contemplated. Certain other changes may be made in the above described method without departing from the true spirit and scope of the invention herein involved. It is intended, therefore, that the subject matter in the above depiction shall be interpreted as illustrative and not in a limiting sense.

What is claimed is:
 1. Method for recognizing a signal pause between two patterns that are present in a time-variant measurement signal and that are recognized using hidden Markov models, comprising the steps of: a) periodically forming in a first signal processing stage, feature vectors for pattern recognition, which describe a signal curve of a measurement signal within a time slice, no speech pause being detected by a pause detector contained therein in a first time slice based on present features of a first feature vector; b) comparing the first feature vector, in a second signal processing stage, in a second time slice that follows the first time slice with at least two hidden Markov models, of which at least one has been trained to a pattern to be recognized and another has been trained to a pattern characteristic for a pause; c) forwarding, if in the comparison of the first feature vector with the hidden Markov models, a greater probability results for the presence of a pause, pause information concerning the presence of a pause to a pause detector in the first signal processing stage, and therein treating the measurement signal as a signal pause, at least in the second time slice.
 2. The method according to claim 1, wherein a defined sequence of patterns is recognizable, and wherein the pause information is forwarded after recognition of the pattern sequence over several time slices, so that in the first signal processing stage, at least in a time slice following the pattern sequence, the measurement signal is treated as a signal pause and not as a pattern to be recognized.
 3. The method according to claim 2, wherein feature vectors are intermediately stored until a pattern sequence has been recognized, and wherein the pause information is forwarded after recognition of the pattern sequences, so that in the first signal processing stage, at least in a time slice before the pattern sequence, the measurement signal is treated as a signal pause and not as a pattern to be recognized.
 4. The method according to claim 1, wherein characteristics of the measurement signal are evaluated in the time domain in the first signal processing stage for pause recognition.
 5. The method according to claim 1, wherein characteristics of the measurement signal are evaluated in the spectral domain in the first signal processing stage for pause recognition.
 6. The method according to claim 1, wherein the Markov models are context-modeled hidden Markov models.
 7. The method according to claim 1, wherein the measurement signal represents uttered speech.
 8. The method according to claim 7, wherein disturbances in a feature extraction stage of a speech processing system are suppressed.
 9. The method according to claim 7, wherein a channel adaptation of a speech channel is carried out.
 10. The method according to claim 1, wherein the measurement signal represents writing motions on a pad.
 11. The method according to claim 1, wherein the measurement signal represents signal sequences of a message-oriented signaling method.